Automatic Error Recovery for Pronunciation Dictionaries

Authors

  • Tim Schlippe
  • Sebastian Ochs
  • Ngoc Thang Vu
  • Tanja Schultz
Abstract

In this paper, we present our latest investigations on pronunciation modeling and its impact on automatic speech recognition (ASR). We propose fully automatic methods to detect, remove, and substitute inconsistent or flawed entries in pronunciation dictionaries. The experiments were conducted on three tasks: (1) word-pronunciation pairs from the Czech, English, French, German, Polish, and Spanish Wiktionary [1], a multilingual wiki-based open content dictionary, (2) our GlobalPhone Hausa pronunciation dictionary [2], and (3) pronunciations to complement our Mandarin-English SEAME code-switch dictionary [3]. In the final results, we observed an average relative improvement of 2.0% in word error rate, and as much as 27.3% for the English Wiktionary word-pronunciation pairs.
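
The improvements above are relative word error rate (WER) reductions, meaning the change is measured against the baseline WER and not in absolute percentage points. A minimal sketch of that arithmetic follows; the function name and the example WER values are hypothetical and chosen only for illustration, since the abstract does not report the underlying baseline figures.

```python
def relative_wer_reduction(baseline_wer: float, new_wer: float) -> float:
    """Return the relative WER reduction in percent.

    Both arguments are WERs in percent (e.g. 20.0 for 20% WER).
    """
    if baseline_wer <= 0:
        raise ValueError("baseline WER must be positive")
    return (baseline_wer - new_wer) / baseline_wer * 100.0


# Hypothetical values, not taken from the paper: a baseline WER of 20.0%
# and a WER of 19.6% after cleaning the pronunciation dictionary.
baseline = 20.0
filtered = 19.6
print(f"{relative_wer_reduction(baseline, filtered):.1f}% relative")
```

With these illustrative numbers, an absolute gain of 0.4 percentage points corresponds to a 2.0% relative reduction.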

Similar resources

Wiktionary as a source for automatic pronunciation extraction

In this paper, we analyze whether dictionaries from the World Wide Web that contain phonetic notations may support the rapid creation of pronunciation dictionaries within the speech recognition and speech synthesis system building process. As a representative dictionary, we selected Wiktionary [1] since it is available in multiple languages and, in addition to the definitions of the words, many...

Automatic Generation of Pronunciation Dictionaries

In this report, we describe a data-driven approach for creating pronunciation dictionaries for a new, unseen target language by voting among phoneme recognizers in nine different languages other than the target language. In this process, recordings of the new language that are transcribed at the word level are decoded by the phoneme recognizers. This results in a hypothesis of nine phonemes per t...

Malay Grapheme to Phoneme Tool for Automatic Speech Recognition

This paper presents the design and performance of a Malay grapheme-to-phoneme (G2P) tool for generating the pronunciation dictionary for a Malay automatic speech recognition (ASR) system. The G2P tool is a rule-based system. It is flexible in adding and removing rules and in handling English words. The G2P tool also contains a morphological and syllable tool, which it uses to determine the pronu...

Efficient compression method for pronunciation dictionaries

Pronunciation dictionaries are often used with other data-driven methods to model the pronunciations in phoneme-based automatic speech recognition (ASR) and text-to-speech (TTS) systems. The dictionaries usually take a great amount of memory, which is a limiting factor in portable handheld devices. Compressing the pronunciation dictionaries results in minimal transmission bandwidth and less stora...

Automatic Learning and Optimization of Pronunciation Dictionaries

Pronunciation dictionaries are the interface between orthographic and phonetic representation of the speech signal and are thereby a substantial component of speech recognition systems. In many systems simple canonical pronunciation forms are used within the dictionary. They represent the “correct” pronunciation as they are found in lexicons and neither contain the most frequent pronunciation n...

Publication date: 2012